feedback message
LILO: Bayesian Optimization with Interactive Natural Language Feedback
Kobalczyk, Katarzyna, Lin, Zhiyuan Jerry, Letham, Benjamin, Zhao, Zhuokai, Balandat, Maximilian, Bakshy, Eytan
For many real-world applications, feedback is essential in translating complex, nuanced, or subjective goals into quantifiable optimization objectives. We propose a language-in-the-loop framework that uses a large language model (LLM) to convert unstructured natural-language feedback into scalar utilities for Bayesian optimization (BO) over a numeric search space. Unlike preferential BO, which accepts only restricted feedback formats and requires customized models for each domain-specific problem, our approach leverages LLMs to turn varied types of textual feedback into consistent utility signals and to incorporate flexible user priors without manual kernel design. At the same time, our method maintains the sample efficiency and principled uncertainty quantification of BO. We show that this hybrid method not only provides a more natural interface to the decision maker but also outperforms conventional BO baselines and LLM-only optimizers, particularly in feedback-limited regimes.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
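Below is a minimal sketch of the language-in-the-loop idea described in the LILO abstract above: textual feedback is mapped to scalar utilities and a standard GP surrogate with expected improvement picks the next candidate. The `llm_utility` function, the scikit-learn surrogate, and the acquisition choice are illustrative assumptions, not the authors' implementation.

```python
# Minimal language-in-the-loop BO sketch (assumptions: llm_utility is a
# placeholder for the LLM call; GP + expected improvement stand in for the
# paper's surrogate and acquisition choices).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def llm_utility(feedback_text: str) -> float:
    """Hypothetical: ask an LLM to map free-form feedback to a scalar utility."""
    raise NotImplementedError("replace with an actual LLM call")


def suggest_next(X_obs, feedback_texts, candidates):
    # 1. Convert unstructured textual feedback into scalar utilities.
    y = np.array([llm_utility(t) for t in feedback_texts])
    # 2. Fit a standard GP surrogate over the numeric search space.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_obs, y)
    # 3. Return the candidate maximizing expected improvement.
    mu, sigma = gp.predict(candidates, return_std=True)
    improvement = mu - y.max()
    z = improvement / np.maximum(sigma, 1e-9)
    ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)
    return candidates[np.argmax(ei)]
```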
ProToM: Promoting Prosocial Behaviour via Theory of Mind-Informed Feedback
Bortoletto, Matteo, Zhou, Yichao, Ying, Lance, Shu, Tianmin, Bulling, Andreas
While humans are inherently social creatures, the challenge of identifying when and how to assist and collaborate with others - particularly when pursuing independent goals - can hinder cooperation. To address this challenge, we aim to develop an AI system that provides useful feedback to promote prosocial behaviour - actions that benefit others, even when not directly aligned with one's own goals. We introduce ProToM, a Theory of Mind-informed facilitator that promotes prosocial actions in multi-agent systems by providing targeted, context-sensitive feedback to individual agents. ProToM first infers agents' goals using Bayesian inverse planning, then selects feedback to communicate by maximising expected utility, conditioned on the inferred goal distribution. We evaluate our approach against baselines in two multi-agent environments: Doors, Keys, and Gems, as well as Overcooked. Our results suggest that state-of-the-art large language and reasoning models fall short of communicating feedback that is both contextually grounded and well-timed, leading to higher communication overhead without corresponding gains in task speedup. In contrast, ProToM provides targeted and helpful feedback, achieving a higher success rate and shorter task completion times, and is consistently preferred by human users.
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- North America > United States > Massachusetts (0.04)
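A toy sketch of the two ProToM stages described above, under simplifying assumptions: goal inference reduces to a Bayes update over a discrete goal set, and feedback selection maximises expected utility under that posterior. The likelihood values and the utility table are illustrative, not the paper's models.

```python
# Toy ProToM-style sketch (assumptions: discrete goal set, hand-specified
# action log-likelihoods and feedback utilities for illustration).
import numpy as np


def goal_posterior(prior, action_loglik):
    """prior[g] = P(goal g); action_loglik[g] = log P(observed actions | goal g)."""
    log_post = np.log(prior) + action_loglik
    log_post -= log_post.max()          # numerical stability
    post = np.exp(log_post)
    return post / post.sum()


def select_feedback(posterior, utility):
    """utility[f, g] = benefit of sending feedback f when the true goal is g."""
    expected = utility @ posterior      # E_g[U(f, g)] for each feedback f
    return int(np.argmax(expected))


# Example: two candidate goals, three candidate feedback messages.
post = goal_posterior(prior=np.array([0.5, 0.5]),
                      action_loglik=np.array([-1.0, -3.0]))
best_message = select_feedback(post, utility=np.array([[1.0, 0.1],
                                                       [0.2, 0.9],
                                                       [0.4, 0.4]]))
```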
Improved Generalized Planning with LLMs through Strategy Refinement and Reflection
Stein, Katharina, Hodel, Nils, Fišer, Daniel, Hoffmann, Jörg, Katz, Michael, Koller, Alexander
LLMs have recently been used to generate Python programs representing generalized plans in PDDL planning, i.e., plans that generalize across the tasks of a given PDDL domain. Previous work proposed a framework consisting of three steps: the LLM first generates a summary and then a strategy for the domain, both in natural language, and then implements that strategy as a Python program, which is debugged on example planning tasks. In that work, only one strategy is generated and passed directly to program generation; if the strategy is incorrect, its implementation will therefore result in an incorrect generalized plan. Here, we introduce an approach that generates the strategy in the form of pseudocode and enables automatic debugging of the pseudocode, allowing us to identify and fix errors before the generalized plan itself is generated. Additionally, we extend the Python debugging phase with a reflection step prompting the LLM to pinpoint the reason for the observed plan failure. Finally, we take inspiration from LLM code generation to produce several program variants and pick the best one. Running experiments on 17 benchmark domains, we show that these extensions substantially improve (and never deteriorate) the quality of the generalized plans. In 12 of the domains, our best Python programs solve all tasks that can be generated with the respective instance generator.
- Europe > Germany > Saarland (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Europe > Denmark > North Jutland > Aalborg (0.04)
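The pipeline described above (pseudocode strategy, pseudocode debugging, program variants with a reflection step, selection of the best variant) could be skeletonised roughly as follows. The `llm` and `solved` functions and all prompt strings are hypothetical placeholders, not the authors' interface.

```python
# Skeleton of the refine-and-reflect pipeline (assumptions: `llm` and `solved`
# are placeholders; prompts are illustrative, not the authors' wording).
def llm(prompt: str) -> str:
    raise NotImplementedError("replace with an actual LLM call")


def solved(program: str, task) -> bool:
    raise NotImplementedError("execute the generalized plan on a PDDL task")


def generate_generalized_plan(domain_description: str, example_tasks, n_variants: int = 4) -> str:
    # Strategy as pseudocode, then automatic pseudocode debugging.
    strategy = llm(f"Write pseudocode for a strategy solving this domain:\n{domain_description}")
    strategy = llm(f"Check this pseudocode against the example tasks and fix any errors:\n{strategy}")
    variants = []
    for _ in range(n_variants):
        program = llm(f"Implement this pseudocode as a Python program:\n{strategy}")
        for task in example_tasks:
            if not solved(program, task):
                # Reflection step: first ask why the plan failed, then repair.
                reason = llm(f"Explain why this program failed on the task:\n{program}")
                program = llm(f"Fix the program given this failure analysis:\n{reason}\n{program}")
        variants.append(program)
    # Keep the variant that solves the most example tasks.
    return max(variants, key=lambda p: sum(solved(p, t) for t in example_tasks))
```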
Reasoning and Sampling-Augmented MCQ Difficulty Prediction via LLMs
Feng, Wanyong, Tran, Peter, Sireci, Stephen, Lan, Andrew
The difficulty of multiple-choice questions (MCQs) is a crucial factor for educational assessments. Predicting MCQ difficulty is challenging since it requires understanding both the complexity of reaching the correct option and the plausibility of distractors, i.e., incorrect options. In this paper, we propose a novel, two-stage method to predict the difficulty of MCQs. First, to better estimate the complexity of each MCQ, we use large language models (LLMs) to augment the reasoning steps required to reach each option. We use not just the MCQ itself but also these reasoning steps as input to predict the difficulty. Second, to capture the plausibility of distractors, we sample knowledge levels from a distribution to account for variation among students responding to the MCQ. This setup, inspired by item response theory (IRT), enables us to estimate the likelihood of students selecting each option, both correct and incorrect. We align these predictions with their ground truth values using a Kullback-Leibler (KL) divergence-based regularization objective, and use the estimated likelihoods to predict MCQ difficulty. We evaluate our method on two real-world math MCQ and response datasets with ground truth difficulty values estimated using IRT. Experimental results show that our method outperforms all baselines, with up to a 28.3% reduction in mean squared error and a 34.6% improvement in the coefficient of determination. We also qualitatively discuss how our method achieves higher accuracy in predicting MCQ difficulty.
- North America > United States > Massachusetts > Hampshire County > Amherst (0.15)
- Europe > United Kingdom (0.14)
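A rough numerical sketch of the IRT-inspired second stage described above: student knowledge levels are sampled from a distribution, per-option scores are turned into selection probabilities, a KL term aligns them with observed selection rates, and difficulty is read off the correct option's likelihood. The scoring model and distribution here are assumptions for illustration only, not the paper's exact formulation.

```python
# IRT-inspired option-likelihood sketch (assumptions: standard-normal knowledge
# levels and a simple logit model; not the paper's exact formulation).
import numpy as np

rng = np.random.default_rng(0)


def option_likelihoods(option_scores, n_samples=1000):
    """option_scores[k]: model-predicted attractiveness of option k."""
    thetas = rng.normal(size=n_samples)                  # sampled knowledge levels
    logits = np.outer(thetas, option_scores)             # stronger students weight scores more
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs.mean(axis=0)                            # P(student selects each option)


def kl_regularizer(pred, observed, eps=1e-9):
    """KL(observed || predicted), used to align predictions with ground truth rates."""
    return float(np.sum(observed * np.log((observed + eps) / (pred + eps))))


pred = option_likelihoods(np.array([2.0, 0.5, 0.3, 0.2]))   # index 0 = correct option
difficulty = 1.0 - pred[0]                                   # harder when correct option is less likely
loss = kl_regularizer(pred, observed=np.array([0.55, 0.20, 0.15, 0.10]))
```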
Automated Feedback in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses
Baral, Sami, Worden, Eamon, Lim, Wen-Chiang, Luo, Zhuang, Santorelli, Christopher, Gurung, Ashish, Heffernan, Neil
The effectiveness of feedback in enhancing learning outcomes is well documented within Educational Data Mining (EDM). Prior research has explored various methodologies to enhance the effectiveness of feedback, and recent developments in Large Language Models (LLMs) have extended their utility in automated feedback systems. This study explores the potential of LLMs to facilitate automated feedback in math education. We examine the effectiveness of LLMs in evaluating student responses by comparing three models: Llama, SBERT-Canberra, and GPT-4. The evaluation requires each model to provide both a quantitative score and qualitative feedback on students' responses to open-ended math problems. We employ Mistral, a version of Llama tailored to math, and fine-tune this model for evaluating student responses by leveraging a dataset of student responses and teacher-written feedback for middle-school math problems. A similar approach was taken for training the SBERT model, while the GPT-4 model used a zero-shot learning approach. We evaluate each model's scoring accuracy and feedback quality using judgments from two teachers, who applied a shared rubric to assess the accuracy and relevance of the generated feedback. We conduct both quantitative and qualitative analyses of model performance. By offering a detailed comparison of these methods, this study aims to further the ongoing development of automated feedback systems and outlines potential future directions for leveraging generative LLMs to create more personalized learning experiences.
- Oceania > Australia > Australian Capital Territory > Canberra (0.26)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York (0.04)
- Education > Educational Setting (1.00)
- Education > Assessment & Standards (1.00)
- Education > Curriculum > Subject-Specific Education (0.94)
- Education > Educational Technology > Educational Software > Computer Based Training (0.88)
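As a rough illustration of the fine-tuning setup described above, the paired student responses and teacher-written feedback might be serialized into prompt/completion records like the following. The field names, prompt wording, score scale, and JSONL format are assumptions, not the study's actual data pipeline.

```python
# Illustrative data preparation for fine-tuning (assumptions: field names,
# prompt wording, and JSONL format are not from the study).
import json


def build_finetune_records(rows, path="feedback_sft.jsonl"):
    """rows: dicts with 'problem', 'student_answer', 'score', 'teacher_feedback'."""
    with open(path, "w", encoding="utf-8") as f:
        for r in rows:
            record = {
                "prompt": (f"Problem: {r['problem']}\n"
                           f"Student answer: {r['student_answer']}\n"
                           "Provide a score (0-4) and written feedback:"),
                "completion": f"Score: {r['score']}\nFeedback: {r['teacher_feedback']}",
            }
            f.write(json.dumps(record) + "\n")
```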
Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions?
McNichols, Hunter, Lee, Jaewook, Fancsali, Stephen, Ritter, Steve, Lan, Andrew
Intelligent Tutoring Systems (ITSs) often contain an automated feedback component, which provides a predefined feedback message to students when it detects a predefined error. Such feedback components typically rely on template-based approaches, which require significant effort from human experts to anticipate a limited number of possible student errors and write corresponding feedback. This limitation is exemplified in open-ended math questions, where students can make a large number of different errors. In our work, we examine the capability of large language models (LLMs) to generate feedback for open-ended math questions, similar to that of an established ITS that uses a template-based approach. We fine-tune both open-source and proprietary LLMs on real student responses and corresponding ITS-provided feedback. We measure the quality of the generated feedback using text similarity metrics. We find that both open-source and proprietary models show promise in replicating the feedback they see during training, but do not generalize well to previously unseen student errors. These results suggest that despite being able to learn the formatting of feedback, LLMs are not able to fully understand the mathematical errors made by students.
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Switzerland (0.04)
- Education > Educational Setting (0.68)
- Education > Educational Technology > Educational Software > Computer Based Training (0.55)
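The evaluation step described above, comparing LLM-generated feedback against the ITS's template feedback with text similarity metrics, could look roughly like this sketch using ROUGE; the paper's exact metric choices are not assumed.

```python
# Text-similarity evaluation sketch (assumption: ROUGE via the rouge-score
# package stands in for the paper's unspecified similarity metrics).
from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)


def similarity_to_its(generated: str, its_feedback: str) -> float:
    """ROUGE-L F1 between LLM-generated feedback and the ITS template message."""
    return scorer.score(its_feedback, generated)["rougeL"].fmeasure


score = similarity_to_its(
    "Check the sign when you move the term to the other side.",
    "Remember to flip the sign when moving a term across the equals sign.",
)
```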
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning
Scarlatos, Alexander, Smith, Digory, Woodhead, Simon, Lan, Andrew
Automatically generating feedback via large language models (LLMs) in intelligent tutoring systems and online learning platforms has the potential to improve the learning outcomes of many students. However, both feedback generation and evaluation are challenging: feedback content has to be valid, especially in subjects like math, which requires models to understand the problem, the solution, and where the student's error lies. Feedback also has to be pedagogically valid, reflecting effective tutoring strategies such as explaining possible misconceptions and encouraging the student, among other desirable features. In this work, we address the problems of automatically generating and evaluating feedback while considering both correctness and alignment. First, we propose a rubric for evaluating math feedback and show that GPT-4 is able to effectively use it to annotate human-written and LLM-generated feedback. Second, we propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL). Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO). We show that our methods significantly increase the correctness and alignment of feedback generated with Llama 2, an open-source LLM, qualitatively analyze our generation and evaluation systems using case studies, and outline several areas for future work.
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Europe > Switzerland (0.04)
- Education > Educational Technology > Educational Software > Computer Based Training (0.68)
- Education > Curriculum > Subject-Specific Education (0.68)
- Education > Educational Setting > Online (0.48)
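For reference, the DPO training signal mentioned above can be written compactly from the policy and reference-model log-probabilities of the preferred (higher-rated) and rejected feedback in each GPT-4-annotated pair. This is the standard DPO loss, not the authors' exact training code.

```python
# Standard DPO loss from per-example sequence log-probabilities
# (illustrative; not the authors' training code).
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Inputs: log P(feedback | prompt) under the policy and frozen reference model."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Push probability mass toward the feedback GPT-4's rubric annotations preferred.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```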
Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning
McNichols, Hunter, Feng, Wanyong, Lee, Jaewook, Scarlatos, Alexander, Smith, Digory, Woodhead, Simon, Lan, Andrew
Multiple-choice questions (MCQs) are ubiquitous at almost all levels of education since they are easy to administer and grade, and are a reliable form of assessment. An important aspect of MCQs is the distractors, i.e., incorrect options that are designed to target specific misconceptions or insufficient knowledge among students. To date, the task of crafting high-quality distractors has largely remained a labor-intensive process for teachers and learning content designers, which limits scalability. In this work, we explore the task of automated distractor and corresponding feedback message generation in math MCQs using large language models. We establish a formulation of these two tasks and propose a simple, in-context learning-based solution. Moreover, we propose generative AI-based metrics for evaluating the quality of the feedback messages. We conduct extensive experiments on these tasks using a real-world MCQ dataset. Our findings suggest that there is significant room for improvement in automated distractor and feedback generation; based on these findings, we outline several directions for future work.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Education > Educational Technology > Educational Software (0.68)
- Education > Curriculum > Subject-Specific Education (0.68)
- Education > Educational Setting (0.68)
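A simple sketch of the in-context learning setup described above: a few worked (question, distractor, feedback) examples are placed in the prompt before the new question. The prompt wording and example fields are assumptions, not the authors' templates.

```python
# Few-shot prompt construction sketch (prompt wording and field names assumed).
def build_prompt(examples, new_question, n_shots=3):
    parts = ["Generate a plausible distractor for the math question and a "
             "feedback message explaining the misconception it targets.\n"]
    for ex in examples[:n_shots]:
        parts.append(f"Question: {ex['question']}\n"
                     f"Distractor: {ex['distractor']}\n"
                     f"Feedback: {ex['feedback']}\n")
    parts.append(f"Question: {new_question}\nDistractor:")
    return "\n".join(parts)
```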
Predicting Desirable Revisions of Evidence and Reasoning in Argumentative Writing
We develop models to classify desirable evidence and desirable reasoning revisions in student argumentative writing. We explore two ways to improve classifier performance: using the essay context of the revision, and using the feedback students received before the revision. We perform both intrinsic and extrinsic evaluation for each of our models and report a qualitative analysis. Our results show that while a model using feedback information improves over a baseline model, models utilizing context - either alone or with feedback - are the most successful in identifying desirable revisions.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Africa > Kenya (0.04)
- Health & Medicine (0.68)
- Education > Educational Setting > K-12 Education (0.50)
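One plausible reading of the two classifier variants described above is that the revision text is optionally concatenated with its essay context and the prior feedback before being fed to a text classifier; the sketch below illustrates that input construction with an off-the-shelf pipeline. The separator token, features, and classifier are illustrative assumptions, not the paper's models.

```python
# Input-construction sketch for the revision classifier (separator, features,
# and classifier are illustrative assumptions, not the paper's models).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_inputs(rows, use_context=True, use_feedback=True):
    """rows: dicts with 'revision', 'context', 'feedback', 'label'."""
    texts, labels = [], []
    for r in rows:
        parts = [r["revision"]]
        if use_context:
            parts.append(r["context"])
        if use_feedback:
            parts.append(r["feedback"])
        texts.append(" [SEP] ".join(parts))
        labels.append(r["label"])
    return texts, labels


clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
# Usage: texts, labels = build_inputs(train_rows); clf.fit(texts, labels)
```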
Can artificial intelligence encourage good behaviour among Internet users?
Hostile and hateful remarks are thick on the ground on social networks in spite of persistent efforts by Facebook, Twitter, Reddit and YouTube to tone them down. Now researchers at the OpenWeb platform have turned to artificial intelligence to moderate Internet users' comments before they are even posted. The method appears to be effective because one third of users modified the text of their comments when they received a nudge from the new system, which warned that what they had written might be perceived as offensive. The study conducted by OpenWeb and Perspective API analysed 400,000 comments that some 50,000 users were preparing to post on sites like AOL, Salon, Newsweek, RT and Sky Sports. Some of these users received a feedback message or nudge from a machine learning algorithm to the effect that the text they were preparing to post might be insulting, or against the rules for the forum they were using.